ci(bench): add per-PR perf canary for extractor/graph/native changes by carlos-alm · Pull Request #1488 · optave/ops-codegraph-tool

carlos-alm · 2026-06-13T00:51:00Z

Summary

Adds .github/workflows/perf-canary.yml — a path-filtered workflow that fires only on PRs touching src/extractors/**, src/domain/graph/**, crates/**, or the benchmark scripts themselves
Runs the incremental-benchmark suite only (full build + no-op + 1-file rebuild, both engines) — completes in ~5–10 minutes vs 20–60 for the full suite
Gates on 50% regression threshold via new BENCH_CANARY=1 mode in the regression guard

Prevents the class of regressions that accumulated invisibly across the Phase 8.x PRs and were only caught at v3.12.0 publish time (+98% native full build, +1827% native 1-file rebuild). The canary would have caught both within 10 minutes of the first regressing PR landing.

Changes:

tests/benchmarks/regression-guard.test.ts — adds BENCH_CANARY mode:

Raises thresholds: 50% standard, 100% noisy-metrics, 150% WASM timing
Skips build/query/resolution describe blocks (only incremental runs)
Skips the stale-KNOWN_REGRESSIONS check (irrelevant for a build gate)

.github/workflows/perf-canary.yml — new workflow:

Triggers on pull_request with path filters for extractors/graph/crates
Builds the native addon from source (same as ci.yml)
Runs incremental-benchmark.ts --dist, updates the incremental report, then invokes the regression guard with BENCH_CANARY=1

Closes #1433

…actual phases

… diff Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh (PostToolUse Bash): the pre-hook captures git status --porcelain to a per-worktree temp file before each Bash call; the post-hook diffs the before/after state and appends newly modified or created files to .claude/session-edits.log. This closes the gap where files written by sed -i, printf redirects, tee, heredocs, or build tools (Cargo.lock, lockfiles) were never recorded, causing guard-git.sh to emit false-positive BLOCKED errors. Closes #1457

- clojure.rs: annotate lifetime-anchor assignment to silence false-positive - cfg.rs: remove never-called start_line_of method - complexity.rs: remove never-constructed NotHandled variant; convert irrefutable if-let patterns to plain let destructures - dataflow.rs: remove never-read callee fields from CallReturn/Destructured - incremental.rs: remove never-read lang field from CacheEntry cargo check and cargo clippy both clean after these changes.

…x struct

…e engine

…remental tiers

Adds .github/workflows/perf-canary.yml — a path-filtered workflow that fires on PRs touching src/extractors/, src/domain/graph/, or crates/** and runs only the incremental-benchmark suite (full build + no-op + 1-file rebuild, both engines). Catches the class of regressions that accumulated invisibly across the Phase 8.x PRs and were only detected at v3.12.0 publish time. The regression guard gains BENCH_CANARY=1 mode: raises thresholds to 50%/100%/150% (standard/noisy/WASM) and skips the build, query, and resolution suites — only incremental checks run. This absorbs shared- runner timing variance while still blocking catastrophic regressions (+98% full build, +1827% 1-file rebuild from v3.12.0). Closes #1433

claude · 2026-06-13T00:51:14Z

Claude finished @carlos-alm's task —— View job

PR Review: ci(bench): add per-PR perf canary for extractor/graph/native changes

Root Cause Analysis

Problem Being Solved: Performance regressions in the extractor/native layers accumulated undetected across the Phase 8.x PR series and were only caught at v3.12.0 publish time (+98% native full build, +1827% native 1-file rebuild). The full pre-publish benchmark suite (20–60 min) is too slow to gate individual PRs. This PR adds a lightweight, path-filtered canary that runs only the incremental benchmark tier (5–10 min) on PRs that touch the relevant layers.

Why This Approach: Targeted path filters + reduced benchmark scope is the correct design — run expensive checks only where regressions are most likely, keep the signal tight. The BENCH_CANARY mode in the regression guard is a clean toggle that reuses the existing guard infrastructure rather than duplicating logic.

Risk Assessment: Low. Thresholds are raised, not lowered — this is explicitly a coarser gate that accepts more CI-noise in exchange for fast feedback, not a mechanism to hide real regressions. The safety argument (native engine catches real regressions; WASM only needs gross-catastrophe detection) is correctly stated in the comments.

Issues Found

1. `has resolve data to compare` assertion runs in canary mode — will fail on a fresh runner (Medium)

tests/benchmarks/regression-guard.test.ts:827-832

test('has resolve data to compare', () => {
  expect(
    resolvePair != null,
    'No import-resolution benchmark data with ≥2 comparable entries',
  ).toBe(true);
});

This test is inside describe('incremental benchmarks') which is not skipped in canary mode. But resolvePair is derived from incrementalHistory with a .filter(e => e.resolve != null) — and the canary workflow writes a fresh incremental-canary-result.json that contains exactly one dev entry. If there is no prior dev-or-release entry with resolve data in the committed history file, resolvePair will be null and this assertion will fail in canary mode.

Similarly, test('has at least one engine to compare') at line 818 makes the same assumption that there are ≥2 incremental entries, which is guaranteed in committed history but only if the dev entry isn't the first. Worth auditing whether the canary's updated report always has a second comparable entry before this gate runs.

2. Missing `node-version` pin — `node: "22"` versus `node: 22` inconsistency (Minor)

perf-canary.yml:46

node-version: "22"

ci.yml uses node-version: 22 (integer, no quotes). This is functionally equivalent with actions/setup-node, but inconsistent with the rest of the repo. Minor, but worth aligning for readability.

3. `actions/checkout@v6` with `fetch-depth: 0` — not needed for this workflow (Minor)

perf-canary.yml:40-42

- uses: actions/checkout@v6
  with:
    fetch-depth: 0

fetch-depth: 0 fetches the full history. The canary doesn't need git history for comparisons — it compares against committed benchmark data in the repo, not against git refs. ci.yml and benchmark.yml use the default shallow clone. This wastes time on large repos and is unnecessary. The fetch-depth: 0 was likely copied from a workflow that does git log or branch comparisons; it should be removed here.

4. `update-incremental-report` step runs before regression guard — creates a misleading artifact on failure (Minor)

perf-canary.yml:94-103

- name: Update incremental report
  run: |
    ...
    node $STRIP_FLAG scripts/update-incremental-report.ts incremental-canary-result.json

- name: Regression guard (50% threshold)
  env:
    RUN_REGRESSION_GUARD: "1"
    BENCH_CANARY: "1"
  run: npm run test:regression-guard

The report is updated (and presumably written to disk) before the regression guard runs. If the canary guard fails, the updated report file with the regressed dev entry is left behind. Since the artifact is always uploaded (if: always()), this is fine for debugging. But if a subsequent step (or a concurrent job) picked up that file, it could persist bad data. For a pure CI artifact this is fine — just worth noting it's not committed, so there's no persistence risk.

5. No `permissions` block — relies on workflow-level defaults (Informational)

perf-canary.yml — entire file

The workflow has no explicit permissions stanza. GitHub's default for pull_request events from forks is read-only for most tokens, so this likely works fine. However, ci.yml also omits explicit permissions, so this is consistent with the repo convention. Just noting it for awareness.

Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies. Uses existing Rust/napi-rs/vitest infrastructure.
Foundation-aligned: ✓ — Pure CI infrastructure. No cloud API calls, no new runtime toolchain. Doesn't modify the programmatic API.
Problem-fit: 3/5 — Correct and necessary CI infrastructure, but doesn't directly address the "lost AI" navigation problem. It's maintenance/quality work. Per the backlog framing: "primarily optimizes for CI/CD" is deprioritized by P8, but preventing a +1827% regression from shipping is a real reliability concern. Call it 3 on the 1–5 scale — useful, not an agent-accuracy feature.
Breaking: No — purely additive CI workflow + test flag.
Tier: 1 (zero-dep ✓, foundation-aligned ✓)

Critical Concerns

The has resolve data to compare test (issue #1 above) is the only one that could cause a spurious canary failure. The fix is straightforward: add test.skipIf(BENCH_CANARY)(...) around that assertion (since the canary's incremental run may not produce resolve data), matching the pattern used for has at least one engine to compare... actually, wait — that test is also not skipped. Both sentinel tests inside describe('incremental benchmarks') assume committed history has ≥2 comparable entries. If the canary writes the first dev entry to a fresh report file (not to the committed one), the comparison data exists in committed history and both tests pass. Verify the update step appends to the committed INCREMENTAL-BENCHMARKS.md, not to incremental-canary-result.json. If it does, the concern is mitigated.

Final Recommendation

Rating: ⭐⭐⭐⭐ (4/5)
Action: APPROVE WITH CONDITIONS
Reasoning: The PR solves a real, documented problem with a well-scoped design. The BENCH_CANARY flag is clean, the threshold adjustments are thoroughly documented with empirical data (CI run numbers, percentage swings, root-cause attribution), and the workflow structure follows established repo conventions. The one actionable issue is the fetch-depth: 0 (remove it — it's wasteful). The has resolve data to compare concern should be confirmed to be safe before merge (i.e., verify the update script appends to committed history, not just the canary JSON). Everything else is minor polish.

Suggested fix before merging:

Remove fetch-depth: 0 from the checkout step in perf-canary.yml — not needed.
Confirm scripts/update-incremental-report.ts appends the dev entry to generated/benchmarks/INCREMENTAL-BENCHMARKS.md (committed file), so findLatestPair has ≥2 entries when the regression guard runs. If it doesn't, the has resolve data to compare and has at least one engine to compare tests will fail on a first-run canary.

greptile-apps · 2026-06-13T00:55:23Z

Greptile Summary

Adds a lightweight per-PR performance canary workflow (.github/workflows/perf-canary.yml) that fires only on changes to the extractor, graph, and native crate layers, and extends regression-guard.test.ts with a BENCH_CANARY=1 mode that runs only incremental benchmarks at relaxed thresholds (50% standard, 100% noisy, 150% WASM).

New workflow builds the native addon from source, runs incremental-benchmark.ts --dist, updates the incremental report, and invokes the regression guard with BENCH_CANARY=1; path filters (including regression-guard.test.ts and update-incremental-report.ts) ensure the canary itself triggers the gate when threshold logic changes.
BENCH_CANARY mode in the test file uses describe.skipIf / test.skipIf to skip build, query, resolution, and stale-exemption checks, while keeping all incremental-benchmark comparisons (plus the 'has at least one engine to compare' and 'has resolve data to compare' guard tests) active so the canary cannot silently pass with no data.

Confidence Score: 5/5

Safe to merge — adds a new workflow and expands a test file with no changes to production code paths.

Both changed files are additive: a new workflow that only runs benchmarks and a test file extension that gates on an environment variable. The threshold values and skip logic are well-documented, internally consistent, and validated by guard tests that prevent silent no-op passes. Action versions and workflow conventions are consistent with the rest of the repo. Previous review feedback has been addressed.

No files require special attention.

Important Files Changed

Filename	Overview
.github/workflows/perf-canary.yml	New workflow: path-filtered PR gate that builds native addon, runs incremental benchmark, and invokes regression guard in BENCH_CANARY mode; consistent with repo conventions (action versions, concurrency, permissions: {}, retry loop).
tests/benchmarks/regression-guard.test.ts	Adds BENCH_CANARY constant and threshold overrides (0.5/1.0/1.5), skips non-incremental describe blocks and maintenance checks in canary mode; incremental benchmarks block and its guard tests remain unskipped as intended.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    PR[Pull Request push] --> PF{Path filter matches?\nsrc/extractors/**\nsrc/domain/graph/**\ncrates/**\nscripts/benchmark*.ts\ntests/benchmarks/regression-guard.test.ts}
    PF -->|No| SKIP[Workflow skipped]
    PF -->|Yes| CHECKOUT[checkout + setup Node 22 + Rust]
    CHECKOUT --> BUILD_NATIVE[napi build --release\ncrates/codegraph-core]
    BUILD_NATIVE --> NPM[npm install with retry loop]
    NPM --> CI_NATIVE[ci-install-native.mjs\noverlay local addon over published binary]
    CI_NATIVE --> TS_BUILD[npm run build\nbuild TypeScript dist/]
    TS_BUILD --> BENCH[incremental-benchmark.ts --version dev --dist\n→ incremental-canary-result.json]
    BENCH --> UPDATE[update-incremental-report.ts\nmutates INCREMENTAL-BENCHMARKS.md locally]
    UPDATE --> GUARD{npm run test:regression-guard\nRUN_REGRESSION_GUARD=1\nBENCH_CANARY=1}
    GUARD -->|native Full build >50%\nor 1-file rebuild >100%| FAIL[❌ Job fails — regression detected]
    GUARD -->|WASM timing >150%| FAIL
    GUARD -->|All within thresholds| PASS[✅ Job passes]
    PASS --> UPLOAD[Upload incremental-canary-result.json artifact]
    FAIL --> UPLOAD

_{Reviews (4): Last reviewed commit: "fix: resolve merge conflicts with main" | Re-trigger Greptile}

greptile-apps · 2026-06-13T00:55:27Z

+concurrency:
+  group: perf-canary-${{ github.ref }}
+  cancel-in-progress: true


The workflow is missing an explicit permissions: block. Other workflows in this repo (e.g. benchmark.yml) use permissions: {} to lock down the token scope. Without an explicit declaration the job inherits the repo default, which may be broader than needed for a read-only benchmark run that only writes an artifact.

Suggested change

concurrency:

group: perf-canary-${{ github.ref }}

cancel-in-progress: true

permissions: {}

concurrency:

group: perf-canary-${{ github.ref }}

cancel-in-progress: true

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fixed — added permissions: {} at workflow level (above concurrency:) to lock down the token scope, consistent with benchmark.yml and publish.yml.

greptile-apps · 2026-06-13T00:55:28Z

+    paths:
+      - "src/extractors/**"
+      - "src/domain/graph/**"
+      - "crates/**"
+      - "scripts/benchmark.ts"
+      - "scripts/incremental-benchmark.ts"
+      - "scripts/lib/bench-config.ts"
+      - "scripts/lib/fork-engine.ts"


The path filter doesn't include tests/benchmarks/regression-guard.test.ts or scripts/update-incremental-report.ts. This very PR modifies the former to introduce BENCH_CANARY mode, but the canary won't fire for that change because no src/extractors/** / crates/** paths were touched. A future PR that adjusts the canary thresholds or the report-update logic will silently bypass this gate: the full pre-publish-benchmark runs but without BENCH_CANARY=1, so the BENCH_CANARY code path is never exercised in CI.

Suggested change

paths:

- "src/extractors/**"

- "src/domain/graph/**"

- "crates/**"

- "scripts/benchmark.ts"

- "scripts/incremental-benchmark.ts"

- "scripts/lib/bench-config.ts"

- "scripts/lib/fork-engine.ts"

paths:

- "src/extractors/**"

- "src/domain/graph/**"

- "crates/**"

- "scripts/benchmark.ts"

- "scripts/incremental-benchmark.ts"

- "scripts/lib/bench-config.ts"

- "scripts/lib/fork-engine.ts"

- "scripts/update-incremental-report.ts"

- "tests/benchmarks/regression-guard.test.ts"

Fixed — added both scripts/update-incremental-report.ts and tests/benchmarks/regression-guard.test.ts to the path filter. PRs that only touch the canary infrastructure will now correctly trigger the canary run.

greptile-apps · 2026-06-13T00:55:29Z

+      - name: Regression guard (50% threshold)
+        env:
+          RUN_REGRESSION_GUARD: "1"
+          BENCH_CANARY: "1"
+        run: npm run test:regression-guard


Canary must be a required status check to act as a gate

The workflow is a separate file and is not included in ci.yml's ci-pipeline aggregator (needs: [lint, native-host-build, test, ...]). For it to actually block merges on regressing PRs, perf-canary / Perf canary (incremental tiers) needs to be added as a required status check in branch protection rules. Without that step the canary is advisory only — a failing run won't prevent the PR from being merged.

Acknowledged — making the canary a required status check in branch protection rules is a repo settings change, not a code change. Filed as a follow-up action for the repo admin to add perf-canary / Perf canary (incremental tiers) to required status checks after this PR merges.

… excess fetch-depth - Add `permissions: {}` to lock down token scope for the read-only canary - Add `scripts/update-incremental-report.ts` and `tests/benchmarks/regression-guard.test.ts` to path filter so PRs that modify the canary machinery itself also trigger the canary - Remove `fetch-depth: 0` (full history not needed; canary compares against committed benchmark data, not git refs) - Align `node-version: 22` with the integer format used in ci.yml

carlos-alm · 2026-06-13T01:46:05Z

Addressed all review feedback:

permissions block (Greptile P2): added permissions: {} at workflow level
path filter gaps (Greptile P2): added scripts/update-incremental-report.ts and tests/benchmarks/regression-guard.test.ts so canary triggers on changes to its own machinery
fetch-depth: 0 (Claude review): removed — not needed since canary compares against committed benchmark data, not git history
node-version: "22" (Claude review): aligned to integer 22 matching ci.yml convention
required status check (Greptile P2): acknowledged as a repo settings action after merge; not a code change

carlos-alm · 2026-06-13T01:46:21Z

@greptileai

carlos-alm added 9 commits June 12, 2026 18:31

chore: gitignore napi-generated artifacts in crates/codegraph-core

3892e7d

chore(tests): remove unused biome suppression in visitor.test.ts

ef8ea4f

fix(titan-run): sync --start-from enum and phase-timestamp list with …

a372b82

…actual phases

refactor(native): extract emit_pts_alias_edges params into PtsAliasCt…

184d221

…x struct

fix(wasm): sort call targets by confidence before emit to match nativ…

909e1df

…e engine

fix(bench): add 2 warmup runs and raise INCREMENTAL_RUNS to 5 for inc…

66fc899

…remental tiers

greptile-apps Bot reviewed Jun 13, 2026

View reviewed changes

Base automatically changed from fix/benchmark-warmup-1440 to main June 13, 2026 09:14

fix: resolve merge conflicts with main

ffd2013

carlos-alm merged commit 65bd28a into main Jun 13, 2026
23 checks passed

carlos-alm deleted the fix/bench-pr-canary-1433 branch June 13, 2026 21:05

github-actions Bot locked and limited conversation to collaborators Jun 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci(bench): add per-PR perf canary for extractor/graph/native changes#1488

ci(bench): add per-PR perf canary for extractor/graph/native changes#1488
carlos-alm merged 11 commits into
mainfrom
fix/bench-pr-canary-1433

carlos-alm commented Jun 13, 2026

Uh oh!

claude Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot Jun 13, 2026

Uh oh!

carlos-alm Jun 13, 2026

Uh oh!

greptile-apps Bot Jun 13, 2026

Uh oh!

carlos-alm Jun 13, 2026

Uh oh!

greptile-apps Bot Jun 13, 2026

Uh oh!

carlos-alm Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Jun 13, 2026

Summary

Uh oh!

claude Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: ci(bench): add per-PR perf canary for extractor/graph/native changes

Root Cause Analysis

Issues Found

1. has resolve data to compare assertion runs in canary mode — will fail on a fresh runner (Medium)

2. Missing node-version pin — node: "22" versus node: 22 inconsistency (Minor)

3. actions/checkout@v6 with fetch-depth: 0 — not needed for this workflow (Minor)

4. update-incremental-report step runs before regression guard — creates a misleading artifact on failure (Minor)

5. No permissions block — relies on workflow-level defaults (Informational)

Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

carlos-alm commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 13, 2026 •

edited

Loading

1. `has resolve data to compare` assertion runs in canary mode — will fail on a fresh runner (Medium)

2. Missing `node-version` pin — `node: "22"` versus `node: 22` inconsistency (Minor)

3. `actions/checkout@v6` with `fetch-depth: 0` — not needed for this workflow (Minor)

4. `update-incremental-report` step runs before regression guard — creates a misleading artifact on failure (Minor)

5. No `permissions` block — relies on workflow-level defaults (Informational)

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading